Summary


The Software Implemented Fault Tolerance 
(SIFT) computer system was developed for NASA 
by SRI International as an experimental vehicle for 
fault-tolerant systems research. SIFT was delivered 
to the Langley Avionics Integration Research Labo- 
ratory (AIRLAB) in April 1982. Development and 
testing have continued at the NASA Langley Re- 
search Center, and several versions of the operating 
system have evolved. Each new version represents 
the different strategies employed to improve the per- 
formance of particular functions of the operating sys- 
tem. The three versions discussed in this paper are 

Version B as delivered (baseline) 

Version R improved reconfiguration performance 

Version V improved vote and reconfiguration
performance

The SIFT operating system is a fully distributed, 
real-time executive with no master controller. The 
operating system is implemented in Pascal and per- 
forms the following major functions: 

1. Periodic task scheduling and dispatching 

2. Data communication and voting 

3. Clock synchronization 

4. Fault isolation 

5. Reconfiguration 

6. Interactive consistency 

These operating system functions fall into two cate- 
gories: Local Executive and Global Executive. The 
Local Executive performs functions local to an indi- 
vidual processor, i.e., (1) and (2) above. The Global 
Executive is a set of tasks which assume the respon- 
sibility of items (3) through (6). 

The SIFT operating system utilizes significant 
resources to achieve fault tolerance in software. This 
overhead falls in two main categories: 

1. The time required to vote the intertask communi- 
cation variables at the beginning of each subframe 

2. The time utilized by the executive tasks 

The overhead from the first category, vote over- 
head, was found to be a linear function of the amount 
of data to be voted. The following table gives a ba- 
sic comparison of the vote times (best values) for 
the versions of SIFT investigated with a six-processor 
configuration: 


Version    Three-way vote time    Five-way vote time
           per buffer, ms         per buffer, ms

B          0.413                  0.412
R           .302                   .357
V           .079                   .107


The vote times were found to vary with the number of 
processors in the configuration and the location of the 
task replicates in the schedule table. These variations 
were typically on the order of 10 to 25 percent. 
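
As a rough illustration of the linear relationship, the following Python sketch scales the best-case per-buffer times tabulated above by the number of buffers voted. The function name and the zero-intercept assumption are ours; the measurements only establish that the dependence is linear.

```python
# Best-case per-buffer vote times (ms) for a six-processor configuration,
# copied from the table in the Summary.
VOTE_TIME_MS = {
    "B": {3: 0.413, 5: 0.412},
    "R": {3: 0.302, 5: 0.357},
    "V": {3: 0.079, 5: 0.107},
}


def estimated_vote_time_ms(version, vote_type, num_buffers):
    """Estimate the vote time for one subframe.

    Assumes the vote time is per-buffer time multiplied by the number of
    buffers, with no fixed offset (an assumption made for this sketch).
    """
    return VOTE_TIME_MS[version][vote_type] * num_buffers


if __name__ == "__main__":
    for version in ("B", "R", "V"):
        estimate = estimated_vote_time_ms(version, 3, 6)
        print(version, round(estimate, 3), "ms for six three-way votes")
```

Under these assumptions, a subframe that votes six buffers three ways costs roughly 2.5 ms in Version B but under 0.5 ms in Version V.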

The overhead due to the second category, the 
executive task overhead, is given below. 


Version    Overhead, ms    Frame size, ms    Overhead, percent
                                             of frame size

B          60.8            83.2              73.2
R          32.0            51.2              62.5
V          28.8            44.8              64.3


The reduced major frame size for Versions R and 
V arises because the improved vote performance en- 
ables some tasks to be scheduled in fewer subframes. 

The SIFT computer system requires significant 
overhead to achieve its fault tolerance. The voting 
and interactive consistency functions were found to 
be the primary sources of operating system overhead. 
Unfortunately, these functions seem to be inherently 
expensive when implemented in software. 

Introduction 

The Software Implemented Fault Tolerance 
(SIFT) computer system was developed for NASA 
by SRI International as an experimental vehicle for 
fault-tolerant systems research. The SIFT effort be- 
gan with broad, in-depth studies stating the reliabil- 
ity and processing requirements for digital comput- 
ers which would, in the next generation of aircraft, 
control flight-critical functions. (See refs. 1 and 2.) 
Detailed design studies were made of fault-tolerant 
architectures which could meet the required reliabil- 
ity and processing requirements. (See refs. 3 and 4.) 
Following these studies, SRI International and the 
Bendix Corporation designed and built the SIFT 
system, which was delivered to the Langley Avion- 
ics Integration Research Laboratory (AIRLAB) in 
April 1982. The SIFT architecture consists of a fully 
distributed configuration of Bendix BDX-930 proces- 
sors with a point-to-point communication link be- 
tween every pair of processors. (See fig. 1.) Although 
the design can accommodate up to eight processors, 




only six processors are in the current system; relia- 
bility estimations have demonstrated that this is ad- 
equate to meet the stated goal of a probability of 
failure of less than 10^-9 for a 10-hour flight.



Figure 1. SIFT system interconnection. 


The basic attributes of fault-tolerant computers 
are 

1. Redundant hardware and tasks are used. 

2. Errors caused by hardware faults are masked by 
voting the redundant outputs. 

3. To increase reliability, faulty hardware is removed 
from the system by means of reconfiguration. 

Important distinctions between SIFT and other 
fault-tolerant computers are 

1. The functions supporting fault tolerance (e.g., 
voting) are primarily implemented in software. 

2. Different tasks can be replicated to different levels 
(i.e., a noncritical task may be simplex, whereas 
more critical tasks can be replicated three-fold or 
five-fold).

3. The unit of reconfiguration is a complete com- 
puter, i.e., processor, memory, and busses. 

4. The design is not based on a special central pro- 
cessing unit (CPU) or memory design. 

5. The redundant computers are loosely synchro- 
nized. 

The assignment of tasks to processors in SIFT 
is predetermined by a task schedule table, which is 
constructed by the application designer. The SIFT 
scheduler periodically dispatches tasks according to 
the task schedule. As processors fail, the hardware 
complement changes. Therefore, the application de- 
signer must define a task schedule for each level of 
configuration the system may encounter. Reconfigu- 
ration in SIFT is essentially accomplished by select- 
ing the appropriate task schedule. The decision to 


reconfigure is based on error information gathered 
when the data from replicate tasks are voted. 

The synchronization of the computers is funda- 
mental to the correct functioning of the exact match 
vote algorithm and communication system. Synchro-
nization ensures that all replicates of a task receive
the same input data and therefore produce the same 
output data if fault free. Interprocessor communi- 
cation is completely asynchronous. No handshake 
signals or rendezvous mechanisms are used. The va- 
lidity of data is guaranteed by the precedence estab- 
lished in the task schedule and the synchronization 
of the processors. 

The SIFT operating system has two levels of au- 
thority. The Local Executive contains procedures 
which support scheduling, voting, and communi- 
cations. The Global Executive consists of tasks 
which cooperate to provide synchronization and re- 
dundancy management (fault isolation and reconfig- 
uration). Since the delivery of SIFT, development 
and testing have continued at the NASA Langley Re- 
search Center, and several versions of the operating 
system have evolved. Each new version represents 
the different strategies employed to improve the per- 
formance of particular functions of the operating sys- 
tem. The three versions discussed in this paper are 

Version B as delivered (baseline) 

Version R improved reconfiguration performance 

Version V improved vote and reconfiguration 
performance 

Version B only ran by disabling the clock inter- 
rupt during certain executive tasks. This was unac- 
ceptable because the disabling of interrupts resulted 
in significant delays in the output of the periodic ap- 
plication tasks. The primary problem was the large 
overhead of the Reconfiguration Task. Therefore, the 
operating system was redesigned. This new version 
is referred to as Version R. Version R is able to sup- 
port reasonable task schedules without disabling the 
clock interrupts during certain executive tasks. Fi- 
nally, the vote system was redesigned to improve the 
vote performance. This version is referred to as Ver- 
sion V. Version V was obtained by enhancement of 
Version R. Before the results of performance mea- 
surements on each version are discussed, an expla- 
nation of pertinent hardware and operating system 
internals is presented. 

Hardware Configuration 

The SIFT processors are Bendix BDX-930 avionics
computers which communicate via a fully connected
point-to-point broadcast network. Although the
interconnection network and operating system are
able to support up to eight processors, only six
processors are currently used in SIFT. Each computer
in the system has a 16-bit CPU, 32K words of static
random access memory (RAM), 1K datafile memory,
1K transaction file memory, a broadcast controller,
a 1553A controller, and a real-time clock. (See
fig. 2.) The CPU is constructed from four 2901 bit
slice chips in a microprogrammed pipeline
architecture and achieves a performance level of
1 million instructions per second. The 1553A
controller provides a MIL-STD-1553A bus interface for
communication with external aircraft systems.

Figure 2. SIFT processor block diagram.

Datafile 

The datafile is a 1K memory block and serves as a
buffer area for the broadcast and 1553A controllers.
The datafile is partitioned into eight 128-word sec- 
tions. The first seven sections function as input 
“mailboxes.” The other section serves as an output 
buffer. Each input data stream from the broadcast 
network is hardwired to a specific mailbox to main- 
tain communication isolation. 

To broadcast a value, the value is first stored in 
the datafile output section. To start the broadcast, 
the location of the value is loaded into the transaction 
pointer register. The broadcast transmitter signals 
completion after 14.7 µs. This allows for worst-case
contention for the receiving datafile. 
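
The partitioning of the datafile can be sketched as a simple address calculation. The Python fragment below is illustrative only: the zero-based section numbering and the placement of the output buffer in the last section are assumptions consistent with the description above, and the function name is hypothetical.

```python
SECTION_WORDS = 128   # words per datafile section (from the text)
NUM_SECTIONS = 8      # 1K-word datafile = 8 sections of 128 words
OUTPUT_SECTION = 7    # assumption: the output buffer is the last section


def datafile_address(section, offset):
    """Return the word address of `offset` within a datafile section.

    Sections 0 through 6 are taken to be the input mailboxes and
    section 7 the output buffer (zero-based numbering is assumed).
    """
    if not 0 <= section < NUM_SECTIONS:
        raise ValueError("section out of range")
    if not 0 <= offset < SECTION_WORDS:
        raise ValueError("offset out of range")
    return section * SECTION_WORDS + offset


if __name__ == "__main__":
    print(datafile_address(0, 8))               # first mailbox
    print(datafile_address(OUTPUT_SECTION, 8))  # output buffer
```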

Real-Time Clock 

The real-time clock is a read/write register which 
produces interrupts at 1.6-ms intervals. The clock is 
a 16-bit counter that is driven by the 16-MHz crystal 
in the CPU. The clock is therefore synchronized 
exactly to the fetch-execute cycle of the CPU. The 
least significant bit of the clock has a value of 1.6 µs.
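
Because the clock is a 16-bit counter, any interval computed from two clock readings must allow for wraparound. The following Python sketch shows one way to do this; the function names are hypothetical, and the sketch ignores any adjustment of the clock register by the synchronization task.

```python
TICK_US = 1.6                  # value of the least significant bit, in µs
COUNTER_MODULUS = 1 << 16      # 16-bit counter wraps at 65536 ticks
TICKS_PER_INTERRUPT = 1000     # 1.6 ms interrupt period / 1.6 µs per tick


def elapsed_ticks(start, end):
    """Ticks elapsed between two readings, tolerating one wraparound."""
    return (end - start) % COUNTER_MODULUS


def ticks_to_us(ticks):
    """Convert a tick count to microseconds."""
    return ticks * TICK_US


if __name__ == "__main__":
    # A reading taken just before the counter wraps and one just after.
    print(elapsed_ticks(65530, 10), "ticks")
    print(ticks_to_us(TICKS_PER_INTERRUPT), "µs per clock interrupt")
```

With these figures the counter wraps roughly every 105 ms (65536 ticks of 1.6 µs), so intervals longer than that cannot be recovered from two readings alone.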


Operating System Overview 

The SIFT operating system is a fully distributed, 
real-time executive with no master controller. The 
operating system is implemented in Pascal and per- 
forms the following major functions: 

1. Periodic task scheduling and dispatching 

2. Data communication and voting 

3. Clock synchronization 

4. Fault isolation 

5. Reconfiguration 

6. Interactive consistency 

These operating system functions fall into two 
categories: Local Executive and Global Executive. 
The Local Executive performs functions local to an 
individual processor, i.e., (1) and (2) above. The 
Global Executive is a set of tasks which assume the 
responsibility of items (3) through (6). The major 
distinction between the Local and Global Executives 
is that the Global Executive tasks exchange data 
and cooperate on a systemwide basis, whereas the 
Local Executive procedures do not exchange data or 
cooperate. 

The primary purpose of the SIFT operating sys- 
tem is to provide fault tolerance through masking of 
errors by voting of replicated data from the redun- 
dant tasks executing on separate computers. The 
voting is exact match (i.e., bit-by-bit comparison) 
and therefore requires that the replicated processes 
use exactly the same input data and produce their 
outputs prior to the vote. System coordination is 
achieved by use of a decentralized clock synchroniza- 
tion algorithm (ref. 4) and a simple communication 
protocol relying on the synchronization provided by 
this algorithm. Basically, the communication proto- 
col requires that a data-producing task broadcast its 
data at some preagreed time T and that the data- 
receiving tasks wait until at least time T + Maxi- 
mum broadcast time + Maximum clock skew before 
reading. Communication within SIFT is therefore 
critically dependent upon the correct performance of 
the clock synchronization algorithm. Data generated 
by a task are available to other tasks only after the 
termination of the data-producing task. The oper- 
ating system allows the application system designer 
to define up to 128 “data buffers” for each proces- 
sor’s mailbox. Each of these “data buffers” consists 
of one word in the datafile area. An application task 
broadcasts its output data by calling a Local Execu- 
tive procedure. The receiving task must be scheduled 
after termination of the last replicate of the data- 
producing task. Prior to execution of the receiving 
task, the Local Executive votes the replicates of the 
data and places the majority value in a “post vote” 
array. To retrieve the voted data, the receiving task
calls a Local Executive function. Several operating 
system data structures must be initialized by the ap- 
plication designer to control the functions which co- 
operate in performing the communications. 
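
The timing rule of the communication protocol can be written out directly. In the Python sketch below, the 14.7-µs broadcast time is taken from the datafile description, while the clock skew bound and the function names are hypothetical values chosen only for illustration.

```python
def earliest_read_time_us(broadcast_time_us, max_broadcast_us, max_skew_us):
    """Earliest local time at which a receiver may read broadcast data.

    Implements T + maximum broadcast time + maximum clock skew.
    """
    return broadcast_time_us + max_broadcast_us + max_skew_us


def safe_to_read(now_us, broadcast_time_us, max_broadcast_us, max_skew_us):
    """True if the receiver's local clock has passed the earliest read time."""
    return now_us >= earliest_read_time_us(
        broadcast_time_us, max_broadcast_us, max_skew_us
    )


if __name__ == "__main__":
    MAX_BROADCAST_US = 14.7   # broadcast completion time from the text
    MAX_SKEW_US = 50.0        # hypothetical skew bound, not from the paper
    print(earliest_read_time_us(3200.0, MAX_BROADCAST_US, MAX_SKEW_US))
    print(safe_to_read(3250.0, 3200.0, MAX_BROADCAST_US, MAX_SKEW_US))
```

No handshake appears anywhere in the sketch; correctness rests entirely on the skew bound holding, which is why the protocol depends on the clock synchronization algorithm.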

The Local Executive also keeps a count of every 
vote disagreement and the identity of the nonagree- 
ing processors. This error information is distributed 
and analyzed by the Global Executive. From this 
analysis, the Global Executive decides whether a re- 
configuration should take place. Although this func- 
tion is transparent to the application tasks at run- 
time, the application designer must initialize sev- 
eral data structures to preplan the reconfiguration 
process. 
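
A minimal sketch of an exact-match majority vote with disagreement bookkeeping is given below in Python. It is not the SIFT voter, which is part of the Pascal Local Executive; the function names and the dictionary-based data layout are assumptions made for illustration.

```python
from collections import Counter


def exact_match_vote(replicas):
    """Vote replicated values by exact (bit-for-bit) comparison.

    `replicas` maps a processor number to the word it broadcast.
    Returns (majority_value, disagreeing_processors). If no value is
    held by a strict majority, returns (None, []) because no processor
    can be singled out.
    """
    if not replicas:
        raise ValueError("no replicas to vote")
    value, votes = Counter(replicas.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        return None, []
    dissenters = sorted(p for p, v in replicas.items() if v != value)
    return value, dissenters


def record_errors(error_counts, dissenters):
    """Increment the per-processor disagreement counts."""
    for processor in dissenters:
        error_counts[processor] = error_counts.get(processor, 0) + 1


if __name__ == "__main__":
    errors = {}
    voted, bad = exact_match_vote({0: 0x1234, 2: 0x1234, 3: 0x1235})
    record_errors(errors, bad)
    print(hex(voted), bad, errors)
```

The error counts accumulated this way correspond to the information that the Global Executive later distributes and analyzes.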

Local Executive Data Structures 

In this section, the functions performed by the 
Local Executive are described. Because the SIFT 
operating system is driven by static data structures, 
the descriptions center around these data structures. 

The Local Executive has two main responsibil- 
ities: (1) scheduling and dispatching of tasks and 
(2) data communication and voting. The following 
data structures are used by the Local Executive: 

1. Task schedule 

2. Task table 

3. Buffer information (BINF) array 

4. Buffer table (BT) array 

5. Vote schedule 

6. POSTVOTE array 

The data linkages between these structures are
illustrated in figure 3. Only the POSTVOTE array
is constructed completely by the operating system.
All the other structures require some, if not all, of
their information to be entered by the application
designer in BDX-930 assembly code.

Figure 3. Linkages between SIFT data structures.

Task schedule. Task scheduling in SIFT is non-
preemptive and based on precalculated schedule ta-
bles. The schedule table defines the set of tasks 
which will be periodically dispatched. This period 
is called a major frame and is partitioned into 3.2-ms 
subframes. Tasks are statically allocated to these 
subframe slots. Therefore, all task execution times 
must be less than or equal to 3.2 ms. If an appli- 
cation requires more time, it must be decomposed 
into a sequence of 3.2-ms tasks. In Version R and 
Version V, the operating system was generalized to 
allow a task to utilize any number of 1.6-ms intervals. 
In all versions, the application designer must allocate 
the tasks in a preplanned schedule table. 
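
The difference between the two scheduling granularities can be illustrated with a small calculation. The Python sketch below uses hypothetical function names and ignores the scheduler and vote overhead that also consume part of each subframe.

```python
SUBFRAME_US = 3200    # subframe length (3.2 ms)
INTERRUPT_US = 1600   # clock interrupt period (1.6 ms)


def ceil_div(a, b):
    """Integer ceiling division for nonnegative a and positive b."""
    return -(-a // b)


def subframe_tasks_needed(execution_us):
    """Version B: number of 3.2-ms tasks a process must be split into."""
    return ceil_div(execution_us, SUBFRAME_US)


def clock_intervals_needed(execution_us):
    """Versions R and V: number of 1.6-ms intervals one task would use."""
    return ceil_div(execution_us, INTERRUPT_US)


if __name__ == "__main__":
    # A hypothetical process needing 7 ms of execution time.
    print(subframe_tasks_needed(7000), "tasks under Version B")
    print(clock_intervals_needed(7000), "intervals under Versions R and V")
```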

Figure 4 shows a typical task assignment to the 
subframes for a six-processor configuration. In this 
schedule, the application designer has defined a ma- 
jor frame containing 32 subframes. 

The tasks are named by three-character identi- 
fiers (e.g., IC1, IC2, MLT, LAT). The application 
tasks are scheduled three times in the major frame. 
This is referred to as a “triple-frame” schedule. All 
the executive tasks except the Interactive Consis- 
tency Tasks are scheduled once per major frame. In 
a “single-frame” schedule, the application tasks are 
only scheduled once in the table. In such a schedule, 
the frame is shorter, but the operating system over- 
head is proportionately larger, as seen subsequently. 

Since the SIFT system is reconfigurable, there 
must also be schedules for five-, four-, three-, and 
two-processor configurations (not shown). Although 
the Local Executive executable code and schedule ta- 
ble are identical on every processor, each executive 
uses a different section of the schedule table. Each 
section of the schedule table is identified by the or- 
dered pair (NW,VPN), where the NW field indicates 
the number of working processors in the configura- 
tion, and the VPN field is the number of the virtual 
processor which uses this section of the schedule. Ev- 
ery physical processor has a virtual processor number 
assigned to it. After a reconfiguration, this virtual 
processor number may change. Since any processor 
may fail, the new virtual processor number cannot 
be predetermined. Thus, each processor contains all 
the schedule table sections. The schedule table then 
consists of a sequence of these sections which are ini- 
tialized in BDX-930 assembly code. 
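
Selection of a schedule table section by the ordered pair (NW,VPN) can be sketched as a table lookup. In the Python fragment below, the rule that surviving processors are renumbered 1 through NW in ascending physical order is an assumption made for the example; the paper does not state how virtual processor numbers are assigned.

```python
def assign_virtual_numbers(working_processors):
    """Map each working physical processor to a virtual processor number.

    Assumption: survivors are numbered 1..NW in ascending physical order.
    """
    ordered = sorted(working_processors)
    return {phys: vpn for vpn, phys in enumerate(ordered, start=1)}


def select_section(schedule_table, working_processors, physical_id):
    """Return the schedule section this processor should execute."""
    nw = len(working_processors)
    vpn = assign_virtual_numbers(working_processors)[physical_id]
    return schedule_table[(nw, vpn)]


if __name__ == "__main__":
    # Every processor holds all sections for configurations of 2 to 6.
    table = {
        (nw, vpn): "NW=%d,VPN=%d" % (nw, vpn)
        for nw in range(2, 7)
        for vpn in range(1, nw + 1)
    }
    print(select_section(table, {1, 2, 3, 4, 5, 6}, 4))
    # After processor 2 is removed, processor 4 takes a different section.
    print(select_section(table, {1, 3, 4, 5, 6}, 4))
```

Because the full table is resident on every processor, reconfiguration in this sketch amounts to recomputing the lookup key.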

                          PROCESSOR

SUBFRAME     1      2      3      4      5      6

    0       IC1    IC1    IC1     -      -      -
    1       IC2    IC2    IC2    IC2     -      -
    2       IC3    IC3    IC3    IC3    IC3    IC3
    3       MLT     -     MLT    MLT    MLT    MLT
    4        -     GUT    GUT    GUT    GUT    GUT
    5       PIT    PIT     -     PIT    PIT    PIT
    6       LAT     -     LAT    LAT    LAT    LAT
    7        -      -      -      -      -      -
    8        -      -      -      -      -      -
    9       IC1    IC1    IC1     -      -      -
   10       IC2    IC2    IC2    IC2     -      -
   11       IC3    IC3    IC3    IC3    IC3    IC3
   12       MLT     -     MLT    MLT    MLT    MLT
   13        -     GUT    GUT    GUT    GUT    GUT
   14       PIT    PIT     -     PIT    PIT    PIT
   15       LAT     -     LAT    LAT    LAT    LAT
   16        -      -      -      -      -      -
   17        -      -      -      -      -      -
   18       IC1    IC1    IC1     -      -      -
   19       IC2    IC2    IC2    IC2     -      -
   20       IC3    IC3    IC3    IC3    IC3    IC3
   21       MLT     -     MLT    MLT    MLT    MLT
   22        -     GUT    GUT    GUT    GUT    GUT
   23       PIT    PIT     -     PIT    PIT    PIT
   24       LAT     -     LAT    LAT    LAT    LAT
   25        -      -      -      -      -      -
   26        -      -      -      -      -      -
   27       ERT    ERT    ERT    ERT    ERT    ERT
   28       FIT    FIT    FIT    FIT     -     FIT
   29       RET    RET    RET    RET    RET    RET
   30        -      -      -      -      -      -
   31       CLT    CLT    CLT    CLT    CLT    CLT

where

IC1, IC2, IC3 - Interactive Consistency Tasks

MLT, GUT, PIT, LAT - Application Tasks

ERT - Error Task

FIT - Fault Isolation Task

RET - Reconfiguration Task

CLT - Clock Task

Figure 4. Typical task assignment in SIFT.

Task table. The task table contains information
specific to each task in the system. The following
Pascal record defines its structure:

    TT: ARRAY [TASK] OF RECORD
        CAUSE:  (TASKTERM, CLOCKINT, SYSTEMSTART);
        BUFS:   INTEGER;
        ERRORS: INTEGER;
        STKPTR: INTEGER;
        STATE:  ARRAY [0..128] OF INTEGER;
    END;

Most of these fields are initialized and managed by 
the operating system. Only the BUFS field and the 
initial state of the task must be initialized by the 
applications programmer. The BUFS field points to 
a list of the buffer numbers in the Buffer information 
array (described below). This list defines the output 
variables of the task. The initial state holds the task 
starting location, terminating routine location, and 
initial register values. Other fields in the task table 
record are used by the scheduler. They contain the 
following information: 

CAUSE     the reason for entry into the scheduler

ERRORS    the number of times the task failed
          to complete

STATE     the state of the task (including
          registers, restart address, and stack
          area) upon interrupt

STKPTR    the value of the stack pointer
          register upon task termination or
          clock interrupt

Buffer information array. Before the Buffer 
information (BINF) array can be constructed, the 
application designer must enter in BDX-930 assembly 
code a list of EQU instructions identifying each buffer 
“name” with a buffer “number”:

    ERRER   EQU   33
    GEREC   EQU   34


These buffer names are simply convenient syn- 
onyms for the buffer numbers. The BINF array is 
essentially a memory pool containing lists of buffer 
names which are pointed to by the BUFS field of the 
task table. Each list is terminated by a zero field. 
The details of the assembly code which initializes this 
structure are not important for this discussion. This 
list is used in the Reconfiguration Task to rebuild the 
BT array, described next. 

Buffer table. The buffer table, BT, is used by the
Local Executive and generated by the Global Execu- 
tive. The BT array is the central data structure used 
by the system for redundancy management. This 
structure maps the logical buffer names to physical 
datafile locations. Since SIFT uses replicated tasks to 
achieve fault tolerance, each data value (i.e., buffer) 
is calculated by several tasks and is broadcast to spe- 
cific locations in the datafile of each processor. The 
BT array maintains the datafile locations of all the 
replicates of the data. 

The BT array, shown below, is essentially a func- 
tion mapping a buffer number into a vector indicating 
where its replicated values reside. 

    BT: ARRAY [0..MAXBUFS] OF RECORD
        DBX: INTEGER;
        AD:  ARRAY [0..MAXPROCESSOR] OF INTEGER;
    END;

The datafile offset (DBX) must be entered by the
application designer. This offset describes the
location of the data within the 128-word mailbox. For
example, buffer number 10 might have an offset of
8. Two different buffer numbers may be assigned the
same datafile offset (DBX), but the application
designer must be careful not to utilize both at the same
time; i.e., they must be time multiplexed. The
ability to time multiplex datafile locations was sacrificed
in Version R to enable an efficient reconfiguration
process. The AD array is constructed by the
Reconfiguration Task from information in the task
schedule, task table, and BINF array and from data from
the Fault Isolation Task. If processor i computes the
buffer, then AD[i] contains the location of processor
i output in the datafile. If processor i does not
compute the buffer, then AD[i] = -1. For example, the
BT array entry relating to the processor 2 datafile is
shown in figure 5. Buffer item 10 is found at an offset
of 8 into the processor mailbox and was produced by
processors 0, 2, 3, and 5.


    DBX:     8
    AD[0]:   8
    AD[1]:   -1
    AD[2]:   904
    AD[3]:   264
    AD[4]:   -1
    AD[5]:   520
    AD[6]:   -1
    AD[7]:   -1


Figure 5. Detailed description of BT[10]. 
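
How the BT array is used to collect the replicates of one buffer can be sketched as follows. The Python fragment is illustrative only: the record is modelled as a dictionary, the function name is hypothetical, and the AD[2] entry is assumed to be 904 (offset 8 within the last 128-word section of the datafile), which is an interpretation of figure 5.

```python
NO_REPLICA = -1   # AD[i] value when processor i does not compute the buffer

# BT entry for buffer 10 as seen from the processor 2 datafile (figure 5).
BT_10 = {"DBX": 8, "AD": [8, -1, 904, 264, -1, 520, -1, -1]}


def replica_values(bt_entry, datafile):
    """Return {processor: value} for every replicate of the buffer."""
    return {
        processor: datafile[address]
        for processor, address in enumerate(bt_entry["AD"])
        if address != NO_REPLICA
    }


if __name__ == "__main__":
    datafile = [0] * 1024                 # 1K-word datafile
    for address in (8, 904, 264, 520):    # replicates of buffer 10
        datafile[address] = 0x0ABC
    print(replica_values(BT_10, datafile))
```

The values gathered this way are what the vote routine would compare; removing a processor from the configuration corresponds to setting its AD entry to -1.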


Vote schedule and POSTVOTE array. The
vote schedule is constructed in parallel with the task
schedule. The vote schedule in figure 6 corresponds
to the task schedule of figure 4. Unlike the task
schedule, there is only one vote schedule for each
level of configuration (i.e., all processors in the
configuration use the same vote schedule). The
schedule contains a list of items to be voted before the
task scheduled for that subframe is executed. The
result of this vote is placed in the POSTVOTE
array. The restriction of allowing only one vote
schedule for each configuration level guarantees that all
good processors contain exactly the same data in the
POSTVOTE buffers, even if their schedules do not
execute tasks which use all the data. Although at
first this appears wasteful, it simplifies the
reconfiguration process. Since all data are available on every
processor during reconfiguration, it is not necessary
to transfer data to a processor when its new schedule
contains a task it previously had not executed.

SUBFRAME    BUFFERS TO BE VOTED

    0
    1       EXPEC
    2       NDR
    3       LOCK   XRESE
    4       QX     QY     QZ
    5       PSIN   PHIN   RN     QDELY  QLATM  TIMER
    6       CMDEL  QDELZ  CMDTH  QPITM
    7       CMDAI  CMDRU
    8
    9
   10       EXPEC
   11       NDR
   12       LOCK   XRESE
   13       QX     QY     QZ
   14       PSIN   PHIN   RN     QDELY  QLATM  TIMER
   15       CMDEL  QDELZ  CMDTH  QPITM
   16       CMDAI  CMDRU
   17
   18
   19       EXPEC
   20       NDR
   21       LOCK   XRESE
   22       QX     QY     QZ
   23       PSIN   PHIN   RN     QDELY  QLATM  TIMER
   24       CMDEL  QDELZ  CMDTH  QPITM
   25       CMDAI  CMDRU
   26
   27
   28
   29       GEREC  GEMEM
   30
   31

Figure 6. Typical SIFT vote schedule.

Global Executive 

The Global Executive performs four major func- 
tions: 

1. Clock synchronization 

2. Error report analysis and identification of faulty 
processors 

3. Logical removal of a faulty processor via reconfig- 
uration 

4. Interactive consistency 

These functions are performed by a set of tasks — 
the Clock Synchronization Task, the Error Task, the 
Fault Isolation Task, the Reconfiguration Task, and 
the Interactive Consistency Task. The details of the 
synchronization process are not presented here. 

When a processor fails, the immediate effect of
its errors is masked by the vote function. The voter
records the number of errors produced by each
processor. Once during each major frame, the Error
Task condenses the local error data and broadcasts
the information. All processors now have a
systemwide record of the errors produced during the
past frame. The Fault Isolation Task searches the
error data to locate faulty processors. The Fault
Isolation Task then broadcasts a value indicating which
processors are faulty. This value is voted and used
by the Reconfiguration Task to compute the set of
“good” processors. The Reconfiguration Task does
not physically remove a faulty processor from the
configuration. The good processors merely agree to
ignore outputs from the faulty processor. Thus,
reconfiguration may be accomplished by changes to the
internal data structures.

VAR
  ERAD AT 16#4000: ARRAY[1..6, 1..MAXSUBFRAME, 1..8] OF INTEGER;
  ERADPT: INTEGER;

GLOBAL FUNCTION SCHEDULER (CAUSE: SCHED_CALL; STATE: INTEGER): INTEGER;
VAR T1, I: INTEGER;
BEGIN
  TSKFN := CLOCK;                              (* - PERFORMANCE MONITOR - *)
  TT[TASKID].STKPTR := STATE;
  IF CAUSE <> TASKTERMINATION THEN             (* CLOCK INTERRUPT *)
    BEGIN
      IF (TASKID <> NULLT) THEN                (* TASK DID NOT COMPLETE *)
        BEGIN TT[TASKID].ERRORS := TT[TASKID].ERRORS + 1;
          BUILDTASK(TASKID);
        END
      ELSE TT[TASKID].STATUS := CLOCKINTERRUPT;
      IF SFCOUNT >= MAXSUBFRAME THEN           (* START NEW MAJOR FRAME *)
        BEGIN SFCOUNT := 0;
          IF FRAMECOUNT >= MAXFRAME THEN FRAMECOUNT := 0
            ELSE FRAMECOUNT := FRAMECOUNT+1; GFRAME := GFRAME+1;
          IF ERADPT = 6 THEN ERADPT := 1       (* - PERFORMANCE MONITOR - *)
            ELSE ERADPT := ERADPT+1;           (* - PERFORMANCE MONITOR - *)
        END
      ELSE SFCOUNT := SFCOUNT+1;
      TSCHEDULE;                               (* SELECT NEW TASK *)
      BCLOCK := CLOCK;                         (* - PERFORMANCE MONITOR - *)
      VSCHEDULE;                               (* PERFORM VOTE *)
      BCLOCK := CLOCK - BCLOCK;                (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,1] := GFRAME;        (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,2] := SFCOUNT;       (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,3] := TSKFN;         (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,4] := BCLOCK;        (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,5] := RCLOCK;        (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,6] := XCLOCK;        (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,7] := 0;             (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,8] := 0;             (* - PERFORMANCE MONITOR - *)
    END
  ELSE                                         (* TASK TERMINATION *)
    BEGIN T1 := TSKFN - TSKST;                 (* - PERFORMANCE MONITOR - *)
      IF T1 > TTIME[TASKID] THEN               (* - PERFORMANCE MONITOR - *)
        TTIME[TASKID] := T1;                   (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,7] := TSKST;         (* - PERFORMANCE MONITOR - *)
      ERAD[ERADPT,SFCOUNT,8] := TTIME[TASKID]; (* - PERFORMANCE MONITOR - *)
      TASKID := NULLT;
    END;
  SCHEDULER := TT[TASKID].STKPTR;
  TSKST := CLOCK;                              (* - PERFORMANCE MONITOR - *)
END; (* SCHEDULER *)

Figure 7. SCHEDULER procedure with instrumentation.

The Reconfiguration Task basically selects a new 
schedule table and regenerates the buffer table (de- 
scribed previously) to accomplish the logical recon- 
figuration. Since the algorithm used to determine 
which processor must be eliminated is decentralized, 
it is essential that the algorithm is designed so ev- 
ery good processor makes exactly the same decision 
at exactly the same time. This must be done cor- 
rectly even in the presence of a malicious processor 
(i.e., one that sends good data to some processors 
and erroneous data to others). This is accomplished 
by use of the “interactive consistency algorithm” de- 
veloped by SRI International (ref. 4) and discussed 
in the section “Interactive Consistency overhead.” 

SIFT Scheduler 

The scheduler consists of two major components — 
the assembly code interrupt handler and the Pascal 
procedure SCHEDULER. The Pascal procedure is 
called from assembly code whenever any one of the 
following three events occurs: (1) system startup, 
(2) task termination, or (3) clock interrupt. The 
SCHEDULER has two primary responsibilities after 
a clock interrupt — vote the data scheduled for the 
subframe and dispatch the next task according to 
the information in the schedule table. 

Under Version B of the SIFT operating system,
an application designer had to divide a process that
took longer than 3.2 ms into a series of 3.2-ms tasks.
An entry had to be made in the schedule table for each
subframe in which the process would run. To spare
the designer the work of partitioning the process
and to reduce the size of the schedule table, the
structure of the schedule table was changed in
Versions R and V to allow the designer to define how
many 1.6-ms interrupts the task should use.

Subsystem Overhead Measurement 

Instrumentation of Operating System 

Since the SCHEDULER controls voting and the 
dispatching of tasks (and therefore the Global Execu- 
tive), the measurement instrumentation was added to 


the SCHEDULER. The SCHEDULER with the mea- 
surement code is shown in figure 7. An array, ERAD, 
was added to store the data during a test. ERAD is a 
three-dimensional array with indices ERADPT, SFCOUNT,
and DVI, which differentiate the 6 major
frames, 32 subframes, and 8 data items, respectively. 
To retrieve the performance data, the SIFT proces- 
sors were allowed to run an arbitrary amount of time 
(5 to 10 s) and then were halted manually from the 
host processor. The portion of the processors’ memo- 
ries containing the ERAD array was copied to a disk 
file and analyzed by offline programs. The eight data 
items are as follows: 


GFRAME     the major frame count (nonrepeating
           count)

SFCOUNT    the subframe count (0..MAXSUBFRAME)

TSKFN      the time the previous task finished

BCLOCK     the vote time for the subframe

RCLOCK     not used

XCLOCK     not used

TSKST      the time at which the task for the
           subframe started

TIME       the maximum task execution time


Figure 8 shows the components of a subframe. 
All the variance in execution time results from either 
the voter or the task itself. The time required for 
dispatching a task was calculated by subtracting the 
vote time from the total time spent in the SCHED- 
ULER routine. This overhead then includes the time 
needed for the instrumentation. Figure 8 shows the 
task schedule overhead to be nominally 270 µs. An
analysis of the voter and the Global Executive tasks 
follows. 


[Timeline from one clock tick (begin subframe) to the next (new
subframe). Segments, in order: schedule task, perform vote,
application task, idle. Durations shown: 0.27 ms, 0.04 to 2.5 ms,
0.3 to 2.4 ms.]

Figure 8. Components of a subframe.
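
The subtraction used to obtain the dispatch overhead can be sketched from the instrumentation values. The Python fragment below assumes, from our reading of figure 7, that TSKFN is the clock value on entry to the SCHEDULER and TSKST the value on exit; the function name and the sample readings are hypothetical.

```python
TICK_US = 1.6               # clock resolution in microseconds
COUNTER_MODULUS = 1 << 16   # the real-time clock is a 16-bit counter


def dispatch_overhead_us(tskfn, tskst, bclock):
    """Scheduler time excluding the vote, in microseconds.

    tskfn  -- clock reading on entry to the scheduler (assumed)
    tskst  -- clock reading when the dispatched task starts (assumed)
    bclock -- vote time for the subframe, in clock ticks
    """
    total_ticks = (tskst - tskfn) % COUNTER_MODULUS
    return (total_ticks - bclock) * TICK_US


if __name__ == "__main__":
    # Hypothetical readings: 419 ticks in the scheduler, 250 of them voting.
    print(round(dispatch_overhead_us(1000, 1419, 250), 1), "µs")
```

As noted in the text, a figure obtained this way includes the time consumed by the instrumentation itself.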


Vote Overhead 

Voting is performed at the beginning of every 
subframe prior to the execution of the application 
task. The operating system scans the vote schedule 
table to determine which data buffers are to be voted.
For each such data buffer, the VOTE routine is 
called. The VOTE routine uses the BT array to 
retrieve the replicated versions of the data from the 
datafile. After the data are retrieved, either VOTE3 
or VOTE5 is called for three-way voting or five- 
way voting, respectively, depending on the number 
of values found. Unfortunately, the time required for 
voting is affected by the following four factors: 

1. VS, the type of vote: three-way or five-way (determined
by the number of task replicates which generate data)

2. NV, the number of data buffers to be voted, as 
indicated in the vote schedule 

3. HP, the position of the data-producing tasks in 
the schedule table 

4. NW, the number of working processors in the 
configuration 

The characteristics of the vote time are different
for each of the operating system versions. These vote
times will be referred to as $V_B$, $V_R$, and $V_V$. Thus,
the vote time for each of these versions is effectively
a function defined on a four-dimensional space as
follows:

$V_B(VS, NV, HP, NW)$

$V_R(VS, NV, HP, NW)$

$V_V(VS, NV, HP, NW)$

Obviously, it is difficult to clearly present the 
details of an empirical function defined on a four- 
dimensional space. In this paper, this is attempted 
by illustrating different planes in the four- 
dimensional space. 


Version B vote times. First, looking only at
a six-processor configuration (NW = 6) and assigning
the highest task replicate to processor 6 (HP = 6),
we obtain the graph in figure 9 for $V_B(5, NV, 6, 6)$.
Surprisingly, the vote time for the three-way vote,
$V_B(3, NV, 6, 6)$, is virtually identical to that of the
five-way vote. A single five-way vote requires 0.412 ms,
whereas a single three-way vote requires 0.413 ms.
(See fig. 10.) Although the VOTE3 routine does execute
faster, the code which retrieves data from the datafile
(see fig. 11) continues until either all eight fields of
the BT array are examined or five good values (i.e.,
not -1) are found.



Figure 9. Five-way vote times $V_B(5, NV, 6, 6)$.
[Plot of vote time (ms) against number of data values voted (NV).]

Figure 10. Three-way vote times for $V_B(3, NV, x, 6)$, where
$x \in \{3, 4, \ldots, 8\}$.
[Plot of vote time (ms) against number of data values voted (NV).]

Thus, when there are only three data-producing
tasks, the loop does not terminate until
“I>MAXPROCESSOR.” (MAXPROCESSOR is 7.) This additional
looping time is almost exactly equal to the
difference between the VOTE3 and VOTE5 execution
times. If five-way voting is not required on any
tasks, then the loop test can be changed to “UNTIL
(J=3) OR (I>MAXPROCESSOR).” With this modification,
the system will be referred to as “TRIAD
SIFT.” The three-way vote in TRIAD SIFT will be
indicated by appending an * after the subscript letter
denoting the version (e.g., $V_{B*}$). The $V_{B*}(3, NV, 6, 6)$
vote times for the TRIAD SIFT are compared with


VAR BT: ARRAY[0..MAXBUFFERS] OF RECORD
          DBX: INTEGER;
          AD: ARRAY[0..MAXPROCESSOR] OF INTEGER;
        END;

PROCEDURE VOTE(B: BUFFER; DEFAULT: INTEGER);
VAR I,J,K: INTEGER;
BEGIN
  J := 0; I := 0;
  REPEAT
    K := BT[B].AD[I];         (* DATAFILE ADDRESS OF BUFFER B *)
    IF K >= 0 THEN
    BEGIN
      J := J + 1;
      P[J] := I;              (* SAVE PROCESSOR NUMBER *)
      V[J] := DATAFILE[K];    (* RETRIEVE DATA FROM DATAFILE *)
    END;
    I := I + 1;
  UNTIL (J=5) OR (I>MAXPROCESSOR);  (* UNTIL 5 VALUES FOUND OR SEARCH DONE *)

  (* CALL APPROPRIATE VOTE FUNCTION; i.e. 5-WAY OR 3-WAY *)

END;


Figure 11. Vote procedure in Version B.



[Figure 12 plot of vote time (ms) against number of data values
voted (NV). Curves: $V_B(3, NV, x, 6)$ with $x \in \{3, 4, \ldots, 8\}$,
and $V_{B*}(3, NV, 6, 6)$ (TRIAD SIFT).]

Figure 12. Vote times for three-way SIFT and TRIAD SIFT.


the normal SIFT vote times in figure 12. From this 
graph, the advantage of the above software modifi- 
cation when only three-fold redundancy is needed is 
clearly seen. 

Because the vote time is a linear function of NV,
the following formula is valid:

$V_B(VS, NV, HP, NW) = C_B(VS, HP, NW) \cdot NV + B$

where $C_B(VS, HP, NW)$ is the slope of the line and
B is the y-intercept. The notation $C_B$ essentially
represents the vote time per buffer and will be referred
to as the vote cost. The symbol B represents the
basic overhead when no buffers are voted. It is
independent of the other parameters of $V_B$ and always
has the value 0.038 ms.

Next, the impact of the position of the tasks in the
schedule table on the vote time is illustrated. In a
six-processor configuration, there are six ways to assign
the five task replicates to the processors. However,
the only factor which influences the vote time is
the highest numbered processor which is running the
task; i.e., in all the following five assignments of task
t1, processor 6 is the highest processor executing the
task, and thus all five assignments have the same vote
time.

Processor number    1    2    3    4    5    6

                         t1   t1   t1   t1   t1
                    t1        t1   t1   t1   t1
                    t1   t1        t1   t1   t1
                    t1   t1   t1        t1   t1
                    t1   t1   t1   t1        t1

For each of the above task assignments, HP = 6.
For the remaining assignment

Processor number    1    2    3    4    5    6

                    t1   t1   t1   t1   t1


the highest processor is 5; thus, HP = 5. The vote
costs $C_B(5, HP, 6)$, $C_B(3, HP, 6)$, and $C_{B*}(3, HP, 6)$
are given as a function of HP in figure 13. (Note
that HP is the physical processor number, which is
derived from the slot position on the broadcast bus.
Therefore, even if the number of working processors
is less than eight, HP can be made equal to 8 by
placing one of the good processors in slot 8.) The
vote times in Version B are independent of NW.
As can be seen in figure 13, the vote costs can be
dependent on HP and therefore on the task schedule
constructed by the application designer. This is not
a desirable attribute, since a task which runs within
its time limit in one schedule may not have enough
time to run if the schedule is slightly modified.

It is noteworthy that the HP dependency arises
from the (J=5) test of the data retrieval loop. By
simply removing this test from the Boolean expression,
this dependency can be eliminated. The
vote time would then always be worst case (i.e.,
$C_B(5, MAXPROCESSOR, 6)$), but the verification
process would be simpler.
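
The effect of the loop termination test can be illustrated with a small simulation. The following Python sketch is not part of SIFT; it only counts passes through the REPEAT..UNTIL loop of figure 11, assuming the zero-based indices 0..MAXPROCESSOR of the BT array. The function name and its parameters are hypothetical.

```python
def retrieval_iterations(producers, stop_count=5, max_processor=7):
    """Count passes through the REPEAT..UNTIL loop of figure 11.

    producers     -- set of zero-based processor indices holding a replicate
    stop_count    -- 5 for the delivered loop, 3 for the TRIAD SIFT variant
    max_processor -- MAXPROCESSOR (7 in the text)
    """
    found = 0        # J in the Pascal listing
    index = 0        # I in the Pascal listing
    iterations = 0
    while True:
        iterations += 1
        if index in producers:    # stands in for the test K >= 0
            found += 1
        index += 1
        if found == stop_count or index > max_processor:
            return iterations


# Three replicates with the delivered (J=5) test: all eight entries are scanned.
print(retrieval_iterations({0, 1, 2}))                 # 8
# Same replicates with the TRIAD SIFT (J=3) test: the loop stops after three.
print(retrieval_iterations({0, 1, 2}, stop_count=3))   # 3
# Five replicates: the count follows the highest producing processor (HP).
print(retrieval_iterations({0, 1, 2, 3, 4}))           # 5
print(retrieval_iterations({0, 1, 2, 3, 5}))           # 6
```

Dropping the `found == stop_count` condition makes every call return `max_processor + 1`, which corresponds to the constant worst-case behavior described above.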

Version R vote times. In Version R, a minor modification
was made to the data retrieval loop code.
(See fig. 14.) This modification reduced the vote time
in certain cases but introduced the additional complexity
that the vote time depends on NW. The
data retrieval loop in Version R differs from that
in Version B in two significant aspects. First, the
loop termination logic now refers to NW rather than
MAXPROCESSOR. Second, the variable I refers to
virtual processor number rather than physical processor
number as in Version B. This is a consequence
of the new design. The BT array in Version R is
indexed by the number of working processors, NW,
and buffer number, B. (See the section on reconfiguration
overhead.) The BT array in Version B was
indexed by buffer number alone. Thus, whereas the
Version B vote times were influenced by the highest
physical processor producing data, the Version R
vote times are influenced by the highest virtual processor
number. Thus in this section, HP will become
HVP, which denotes the highest virtual processor
which produces the data being voted. In figure 15,
the vote costs $C_R(3, 3, NW)$, $C_R(5, 5, NW)$,
and $C_R(5, 6, 6)$ are given as a function of the number
of working processors, NW. The value of $C_R(5, 6, 6)$
is greater than that of $C_R(5, 5, 6)$ because an extra
BT array access is needed. The three-way vote cost
dependency on NW occurs because the BT array
search loop terminates when I becomes greater than
NW. In figure 16, the effect of HVP is illustrated.
The three-way vote cost $C_R(3, HVP, NW)$ is seen to
be independent of HVP. However, as can be seen in
figure 17, the TRIAD SIFT three-way vote cost is
dependent on the parameter HVP. In figure 18, TRIAD
SIFT is compared with Version R SIFT.
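
The NW and HVP dependencies described above can be reproduced with a short simulation of the bit-vector scan in figure 14. This is a hedged Python sketch, not the SIFT code: it assumes bit i of the vector corresponds to loop index I = i, and the function name is hypothetical.

```python
def scan_bit_vector(bits, nw, stop_count=5):
    """Mimic the figure 14 loop: return (producers found, loop iterations)."""
    producers = []
    index = 0                      # I in the Pascal listing
    iterations = 0
    while True:
        iterations += 1
        if bits & 1:               # ODD(K)
            producers.append(index)
        bits >>= 1                 # K := K DIV 2
        index += 1
        if len(producers) == stop_count or index > nw:
            return producers, iterations


# Three-way vote: the iteration count follows NW, not the producer positions.
print(scan_bit_vector(0b000111, nw=6))   # ([0, 1, 2], 7)
print(scan_bit_vector(0b000111, nw=3))   # ([0, 1, 2], 4)
# Five-way vote: one extra pass when the fifth producer sits one bit higher.
print(scan_bit_vector(0b011111, nw=6))   # ([0, 1, 2, 3, 4], 5)
print(scan_bit_vector(0b101111, nw=6))   # ([0, 1, 2, 3, 5], 6)
```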

The NW dependency of this version arises from
the (I>NW) test in the REPEAT-UNTIL loop. By
returning this to the (I>MAXPROCESSOR) test,
the NW dependency can be eliminated. Although
the (I>NW) test results in a reduced vote time for
smaller SIFT configurations (e.g., after several
reconfigurations), additional vote overhead complexity
Figure 13. Vote costs for $C_B(5, HP, 6)$, $C_B(3, HP, 6)$, $C_{B*}(3, HP, 6)$.


VAR BT: ARRAY[0..MAXPROCESSOR, 0..MAXBUFFERS] OF INTEGER;

PROCEDURE VOTE(B: BUFFER; DEFAULT: INTEGER);
VAR I,J,K: INTEGER;
BEGIN
  J := 0; I := 0;
  K := BT[NW,B];       (* BIT VECTOR OF DATA PRODUCING PROCESSORS *)
  REPEAT
    IF ODD(K) THEN     (* PROCESSOR I COMPUTES BUFFER *)
    BEGIN
      J := J + 1;
      P[J] := I;       (* SAVE PROCESSOR NUMBER *)
      V[J] := DATAFILE[VTODF[I]+B];  (* RETRIEVE DATA FROM DATAFILE *)
    END;
    K := K DIV 2;      (* SHIFT NEXT PROCESSOR BIT TO LSB *)
    I := I + 1;
  UNTIL (J=5) OR (I>NW);  (* UNTIL 5 VALUES FOUND OR SEARCH DONE *)

  (* CALL APPROPRIATE VOTE FUNCTION; i.e. 5-WAY OR 3-WAY *)

END;

Figure 14. Vote procedure in Version R.
Figure 15. Vote costs of optimized system.
[Curves: $C_R(5, 6, 6)$, $C_R(5, 5, NW)$, $C_R(3, 3, NW)$.]

Figure 16. Vote costs for $C_R(5, HVP, 6)$ and $C_R(3, HVP, NW)$.
[Curves: $C_R(5, HVP, 6)$, and $C_R(3, HVP, NW)$ for NW = 6, 5, 4, 3.]

Figure 17. Vote costs for $C_{R*}(3, HVP, NW)$ of TRIAD SIFT.
[Horizontal axis: number of processors in system (NW); one curve for
each highest virtual processor in vote (HVP) = 6, 5, 4, 3.]

Figure 18. Vote costs for $C_R(5, HVP, 6)$, $C_R(3, HVP, 6)$, $C_{R*}(3, HVP, 6)$.


VAR BT: ARRAY[0..MAXPROCESSOR, 0..TASKS] OF INTEGER;

PROCEDURE VOTE(TK: TASK; DEFAULT: INTEGER);
VAR I,J,K: INTEGER;
    B: BUFFER;
BEGIN
  J := 0; I := 0;
  K := BT[NW,TK];         (* BIT VECTOR OF DATA-PRODUCING PROCESSORS *)
  REPEAT
    IF ODD(K) THEN        (* PROCESSOR I EXECUTED TASK TK *)
    BEGIN
      J := J + 1;
      P[J] := I;          (* SAVE VIRTUAL PROCESSOR NUMBER *)
      DF[J] := VTODF[I];  (* SAVE DATAFILE OFFSET *)
    END;
    K := K DIV 2;         (* SHIFT NEXT PROCESSOR BIT TO LSB *)
    I := I + 1;
  UNTIL (J=5) OR (I>NW);
  I := TT[TK].BUFS;       (* RETRIEVE INDEX OF FIRST BUFFER OF TASK *)
  B := BINF[I];           (* RETRIEVE BUFFER NUMBER *)
  WHILE B > 0 DO
  BEGIN
    .
    .                     (* CALL APPROPRIATE VOTE PROCEDURE, i.e. 5-WAY OR 3-WAY *)
    I := I + 1;
    B := BINF[I]          (* RETRIEVE NEXT BUFFER NUMBER *)
  END;


Figure 19. Vote procedure in Version V.



is also obtained. This overhead complexity significantly
increases the effort to validate the system,
since one must ensure that adequate time is present
for all the tasks to run under all possible configurations
and all possible schedules that the system
may encounter. Perhaps the workload requirements
of the degraded configurations will require this reduced
overhead (i.e., for small values of NW), since
processing power is scarce, but a serious price is paid
for this during the validation effort.

Version V vote times. In Version V, the explicit
purpose of the operating system modification was to 
decrease vote costs. An analysis of the VOTE pro- 
cedure showed that a great deal of time was spent 
indexing into the BT array for each buffer. After the 
modifications of Version R, the information in the BT 
array was reduced to a bit vector representing the 
set of processors which produces the buffer. Since 
the set of processors that runs a task which com- 
putes a buffer is equivalent to the set of processors 
that produces the buffer, the approach taken was to 
modify the BT array so the VOTE procedure could 
manipulate a task instead of a buffer. (See fig. 19.) 

The BT array was modified to be indexed by
task rather than by buffer number, and the vote
schedule was changed to a list of task names instead
of buffer numbers. As seen in figure 20, although
Version V pays an initial penalty and actually takes
longer when voting one buffer, it shows significant
improvement over Version R (and therefore also over
Version B) for more than one buffer. In Versions B
and R, the basic overhead time B is a constant, but
in Version V, all the configuration dependency is
in the basic overhead. Thus, the following formula
describes the Version V vote overhead $V_V$:

$V_V = C_V(VS) \cdot NV + B_V(VS, HVP, NW)$

Since $B_V$ is no longer a constant overhead, it will
be referred to as vote bias. Figure 21 shows this
vote bias relationship to VS, HVP, and NW. $B_V$
is dependent on NW, as shown by the solid lines
$B_V(5, HVP = NW, NW)$ and $B_V(3, x, NW)$. $B_V$
is only dependent on HVP for NW = 6 in a five-way
vote. For TRIAD SIFT, $B_{V*}$ is independent
of NW and only dependent on HVP, as shown by
the dashed lines $B_{V*}(3, x, NW)$. The $B_V$ term only
contains the data retrieval overhead. The actual time
spent voting is represented by $C_V$ and is dependent
only on the type of vote done. For a three-way vote,
$C_V(3) = 0.079$ ms. A five-way vote has a cost, $C_V(5)$,
of 0.107 ms.
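
As a worked illustration of the Version V formula, the following Python sketch evaluates the vote time for six buffers. The vote costs are the values quoted above; the bias values 0.328 ms (five-way) and 0.245 ms (three-way) are the best-case values reported for figure 21 and in the summary comparison of vote times. The function name and the six-buffer scenario are chosen here for illustration only.

```python
SUBFRAME_MS = 3.2


def vote_time_v(nv, cost_ms, bias_ms):
    """V_V = C_V(VS) * NV + B_V(VS, HVP, NW), all times in ms."""
    return cost_ms * nv + bias_ms


# Five-way vote of six buffers: C_V(5) = 0.107 ms, B_V(5, 5, NW) = 0.328 ms.
five_way = vote_time_v(6, cost_ms=0.107, bias_ms=0.328)
# Three-way vote of six buffers: C_V(3) = 0.079 ms, best-case bias 0.245 ms.
three_way = vote_time_v(6, cost_ms=0.079, bias_ms=0.245)

print(f"{five_way:.3f} ms, {100 * five_way / SUBFRAME_MS:.1f}% of a subframe")
# 0.970 ms, 30.3% of a subframe
print(f"{three_way:.3f} ms, {100 * three_way / SUBFRAME_MS:.1f}% of a subframe")
# 0.719 ms, 22.5% of a subframe
```

The five-way result agrees with the statement elsewhere in the paper that six buffers can consume over 30 percent of a 3.2-ms subframe.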

The redesign of the vote system moved the dependencies
on NW and HVP to the vote bias. However,


TABLE I. ERROR IMPACT ON THREE-WAY VOTE

Faulty          Increase in
processor(s)    vote time, ms

1                 0.039
2                 0.031
3                 0.031
1,2               0.096


TABLE II. ERROR IMPACT ON FIVE-WAY VOTE

Faulty          Increase in
processor(s)    vote time, ms

1                 0.040
2                 0.032
3                 0.032
4                 0.032
5                 0.032
1,2               0.079
1,3               0.072
1,4               0.072
1,5               0.079
2,3               0.071
2,4               0.063
2,5               0.063
3,4               0.063
3,5               0.061
4,5               0.061
1,2,3             0.072
1,2,4             0.064
1,2,5             0.064
1,3,4             0.064
1,3,5             0.064
1,4,5             0.072
2,3,4             0.064
2,3,5             0.064
2,4,5             0.064
3,4,5             0.064

as in the other versions, this dependency can be elim- 
inated by removing the (I>NW) and (J=5) tests 
from the UNTIL loop. 

Error impact on vote time. The vote time measurements
given in the previous sections were made
only when the data replicates were identical. Un- 
fortunately, the vote time is increased if some of 
Figure 20. Comparison of $V_V$ and $V_R$ vote times.

[Figure 21 plot labels, values in ms: $B_V(5, 6, NW) = 0.345$;
$B_V(5, 5, NW) = 0.328$; $B_{V*}(3, 6, 6) = 0.296$;
$B_{V*}(3, 5, NW) = 0.278$; $B_{V*}(3, 4, NW) = 0.261$;
$B_{V*}(3, 3, NW) = 0.245$.]

Figure 21. Vote bias dependency on NW and HVP in $V_V$.


the data values are erroneous. Furthermore, the increase
is slightly dependent on the particular processor
which generated the erroneous data because
of the structure of the IF..THEN..ELSE statements
in the VOTE procedure. The increases in vote time
are given in table I and table II. From the tables
it is obvious that the application designer must be
very careful to ensure that the worst-case vote time
is accommodated when generating the vote and task
schedule tables.
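
One way a designer might extract the worst-case allowance from tables I and II is sketched below in Python. The dictionaries copy the tabulated values; the helper name is hypothetical, and the sketch makes no assumption about whether the increase applies per voted buffer or per subframe, which the tables do not state.

```python
# Increase in vote time (ms) per set of faulty processors, from tables I and II.
THREE_WAY_MS = {(1,): 0.039, (2,): 0.031, (3,): 0.031, (1, 2): 0.096}
FIVE_WAY_MS = {
    (1,): 0.040, (2,): 0.032, (3,): 0.032, (4,): 0.032, (5,): 0.032,
    (1, 2): 0.079, (1, 3): 0.072, (1, 4): 0.072, (1, 5): 0.079,
    (2, 3): 0.071, (2, 4): 0.063, (2, 5): 0.063, (3, 4): 0.063,
    (3, 5): 0.061, (4, 5): 0.061,
    (1, 2, 3): 0.072, (1, 2, 4): 0.064, (1, 2, 5): 0.064,
    (1, 3, 4): 0.064, (1, 3, 5): 0.064, (1, 4, 5): 0.072,
    (2, 3, 4): 0.064, (2, 3, 5): 0.064, (2, 4, 5): 0.064,
    (3, 4, 5): 0.064,
}


def worst_case_increase(table, fault_count):
    """Largest tabulated increase for the given number of faulty processors."""
    return max(ms for faulty, ms in table.items() if len(faulty) == fault_count)


print(worst_case_increase(THREE_WAY_MS, 2))   # 0.096
print(worst_case_increase(FIVE_WAY_MS, 1))    # 0.04
print(worst_case_increase(FIVE_WAY_MS, 2))    # 0.079
```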

Executive Task Overhead 

Reconfiguration overhead. To maintain a high
level of reliability, the SIFT Global Executive re- 
moves faulty processors from the system, or recon- 
figures. The process is divided into three tasks. 
The Error Task transfers the local error data to the 
Global Executive via an error report. The Fault 
Isolation Task uses the error report from each pro- 
cessor to locate faulty processors. The Reconfigura- 
tion Task determines if a reconfiguration is necessary. 
During normal operation, each Global Executive task 
uses one 3.2-ms task slot. If a fault occurs and re- 
sults in a reconfiguration, the Reconfiguration Task 
utilizes resources significantly in excess of 3.2 ms. 

The exact time for reconfiguration depends on the
number of working processors, but the worst case
was found in Version B to be 35.19 ms or 11 subframes.
(See fig. 22.) Since the scheduling is static,
it must be based on worst-case performance, and
hence 11 subframes must be dedicated to the
Reconfiguration Task, even though they go unused
the vast majority of the time. However, in
Version B, rather than dedicate 11 subframes, the
real-time clock interrupt was disabled during the
Reconfiguration Task. This allowed the task to take as
much time as necessary, even though it was allocated
to only one subframe. This was clearly unacceptable,
since a serious disruption of output data would
occur during the reconfiguration process. Therefore,
the SIFT Reconfiguration Task was redesigned at the
NASA Langley Research Center.

As stated above, the worst-case time for a recon- 
figuration is equivalent to eleven 3.2-ms subframes 
under Version B. Most of this time is spent recon- 
structing the BT array. The reconstruction is needed 
because the physical to virtual processor mapping 
changes after every reconfiguration. This mapping is 
necessary because the voter operates on physical
processors, whereas the task schedules (and therefore,
the processor data production information) are nec- 
essarily built in reference to virtual processors. The 
Version R design enables the Reconfiguration Task 
to execute in one 3.2-ms subframe. 
Figure 22. Reconfiguration times for Versions B and R.


The major points of the Version R redesign are 

1. A buffer number defines the location of the buffer 
within the processor mailbox; i.e., each buffer 
item is assigned a unique mailbox address. 

2. Array VTODF maps the virtual processor num- 
ber to the corresponding physical processor mail- 
box offset in the datafile. 

3. Arrays RTOV and VTOR map real to virtual 
processor numbers and vice versa, respectively. 

4. The BT array is restructured to hold the virtual 
processor data production information for every 
configuration level. 

5. The voter operates on virtual processors, and 
hence the ERROR array is represented in terms 
of virtual processors. 

6. The Error Task translates the ERROR array in- 
formation to physical processor numbers while 
constructing the error report. 

A minor loss in generality comes about from 
point (1) above. Under Version B, two or more 
buffers could be assigned to the same mailbox lo- 
cation. This would be necessary if, for example, the 
system required more than 128 data buffers. This 
condition is not conceptually restrictive, since the 
datafile could be enlarged to 32K words (or more!) 
to allow 4000 buffers per processor. The BT array 
under Version R of the SIFT Operating System now 
has the form 

BT: ARRAY[PROCESSOR, BUFFER] OF INTEGER;

For a configuration level of NW processors and
buffer B, BT[NW,B] contains a bit map of the virtual
processors that produce buffer B. Under Version R,
the BT array is filled during system initialization
from task schedule information. The BT then
contains valid information for all levels of system con- 
figuration. The Reconfiguration Task now builds ar- 
rays VTOR, RTOV, and VTODF. During a recon- 
figuration, since the BT array no longer needs to be 
rebuilt, the only chore is to select the proper task 
and vote schedules. The execution time for the Re- 
configuration Task was reduced to about 2.5 ms for 
all levels of initial configuration. 
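
The kind of bookkeeping left to the Reconfiguration Task can be sketched as follows. This Python fragment is illustrative only: it assumes that virtual processors are numbered consecutively from 1 in order of ascending physical number, that physical processors are numbered from 1, and that each mailbox occupies a fixed number of datafile words. `MAILBOX_WORDS` and `build_maps` are hypothetical names, and the value 128 is an assumed mailbox size, not a figure taken from the SIFT source.

```python
MAILBOX_WORDS = 128   # hypothetical mailbox size per processor, in words


def build_maps(working, max_processors=8):
    """Return (vtor, rtov, vtodf) for the working physical processors.

    working -- physical processor numbers (1-based) still in the configuration
    """
    vtor = {}                                               # virtual -> real
    rtov = {p: None for p in range(1, max_processors + 1)}  # real -> virtual
    vtodf = {}                                              # virtual -> datafile offset
    for virtual, real in enumerate(sorted(working), start=1):
        vtor[virtual] = real
        rtov[real] = virtual
        vtodf[virtual] = (real - 1) * MAILBOX_WORDS
    return vtor, rtov, vtodf


# Processor 3 has been removed from a six-processor configuration.
vtor, rtov, vtodf = build_maps({1, 2, 4, 5, 6})
print(vtor)       # {1: 1, 2: 2, 3: 4, 4: 5, 5: 6}
print(rtov[3])    # None
print(vtodf[3])   # 384
```

Because only these small maps change, a BT array built once for every configuration level does not need to be touched, which is the point of the Version R redesign.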


Interactive Consistency overhead. Data values
from the external environment are unreplicated and 
must be transferred to all computers of the system 
in a consistent manner; i.e., all computers must re- 
ceive the same value. This could be accomplished 
by every processor reading the external source inde- 
pendently or by one processor reading the external 
source and then distributing the obtained value to 
the rest of the processors. In the first case, each 
processor might get a different value because of the 
inherent uncertainty of reading analog data. Hence, 
a subsequent exchange of the values read, along with 
a midvalue selection, is required to produce a value 
which is consistent across all processors. However, 
if one of the processors is malicious, i.e., sends dif- 
ferent values to different processors, then the good 
processors can still end up with different values. (See 
fig. 23.) The second method will produce similar er- 
roneous results if the single “input” processor is ma- 
licious. Note that although the good processors each 
decide on a slightly different value, they are both 
“good” in that the difference is only the slight differ- 
ence in the redundant external sources. A midvalue 
selection on the replicated output channels would al- 
ways result in a “good” output. However, if exact 
match voting is used to detect and isolate the fault, 
then serious problems can result, e.g., a good proces- 
sor can be reconfigured out of the system. Thus, spe- 
cial “interactive consistency” algorithms are essential 
in fault-tolerant systems in which fault isolation and 
reconfiguration are performed. (See refs. 5 and 6.) 
In systems in which fault-masking is performed but 
no reconfiguration is attempted, such algorithms are 
unnecessary. 
Figure 23. Distributing unreplicated data.


The overhead for these special interactive consis- 
tency algorithms can be very large. This overhead is 
especially severe because the failure modes that the 
algorithms eliminate may be rare events. In the ab- 
sence of a thorough analysis demonstrating that the 
probability of these failure modes is negligible, inter- 
active consistency algorithms must be used. In order 
to accommodate m faulty processors, the total num- 
ber of processors, n, must be at least 3m + 1. The 
number of messages required to obtain interactive
consistency is on the order of $n^{m+1}$. Although
five-way voting, which can deal with two internal faults,
is supported in SIFT, the interactive consistency al- 
gorithm that was implemented can only handle one 
malicious fault. The simple flight control applica- 
tions currently running in SIFT use 63 external sen- 
sor values, each of which goes through the interactive 
consistency algorithm. This requires 11.8 ms when 
no disagreements occur in the data. With faulty data 
present, the Interactive Consistency Tasks can utilize 
up to 13.4 ms. 

The interactive consistency algorithm consists of 
the following steps: 

1. The source value is input and distributed to the 
n processors. 

2. The received values are exchanged m times. 

3. A consistent value is obtained by use of a recur- 
sive algorithm. When m = 1, this reduces to 
determining a majority value. 
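
For m = 1 the three steps above can be sketched in a few lines. The following Python simulation is the textbook one-round exchange followed by a majority, not SIFT's Pascal implementation; the function names, the default value of 0, and the way a faulty receiver is modeled are all assumptions made for illustration. It models one source and three receivers (n = 4, which satisfies n ≥ 3m + 1).

```python
from collections import Counter


def majority(values, default=0):
    """Return the strict-majority value, or default if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def interactive_consistency(sent, liar=None, lie=None):
    """One exchange round (m = 1) among the receivers of a single source.

    sent -- value the source delivered to each receiver (values may differ
            if the source is faulty)
    liar -- index of a faulty receiver, or None
    lie  -- function(to_index) giving the value the faulty receiver relays
    Returns the value chosen by each receiver.
    """
    decided = []
    for me in range(len(sent)):
        seen = [sent[me]]                    # step 1: value received directly
        for other in range(len(sent)):
            if other != me:                  # step 2: values relayed by peers
                seen.append(lie(me) if other == liar else sent[other])
        decided.append(majority(seen))       # step 3: majority
    return decided


# Faulty source sends three different values: all receivers fall back to 0.
print(interactive_consistency([10, 20, 30]))                  # [0, 0, 0]
# Faulty source sends one odd value: all receivers still agree.
print(interactive_consistency([10, 10, 30]))                  # [10, 10, 10]
# Good source, receiver 2 relays garbage: receivers 0 and 1 agree on 7.
print(interactive_consistency([7, 7, 7], liar=2,
                              lie=lambda to: 100 + to)[:2])   # [7, 7]
```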

The following execution times were measured for 
SIFT: 

Step (1) 3.05 ms 
Step (2) 2.22 ms 
Step (3) 6.57 ms 
Total 11.84 ms 

Since the Interactive Consistency Tasks must be 
executed at the data sample rate, a large portion of 
the available CPU time is consumed, as shown in the 
table below. 


Data sample period,    Utilization,
        ms               percent

       100                11.8
        50                23.7
        33                35.9
        25                47.4
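
The utilization figures follow directly from dividing the measured 11.84-ms total by the sample period. A minimal Python check (the function name is chosen here for illustration):

```python
IC_TIME_MS = 3.05 + 2.22 + 6.57   # steps (1) to (3), 11.84 ms in total


def utilization_percent(sample_period_ms, task_time_ms=IC_TIME_MS):
    """CPU utilization of the Interactive Consistency Tasks, in percent."""
    return 100.0 * task_time_ms / sample_period_ms


for period in (100, 50, 33, 25):
    print(period, round(utilization_percent(period), 1))
# 100 11.8
# 50 23.7
# 33 35.9
# 25 47.4
```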


Execution times of executive tasks. The maximum
execution times of the SIFT executive tasks
were measured and are tabulated below. The
dispatch time represents the amount of time utilized
by the operating system prior to dispatching the executive task. This includes the vote time for this
subframe. Therefore, the dispatch overhead is strongly dependent on the vote schedule. It should be 
noted that some variables voted during a particular subframe are not necessarily used by that task. The 
execution time column gives the time used by the executive task after being dispatched. The total time 
column is the sum of the dispatch time and execution time columns. A single frame is a major frame that 
contains one iteration of the sample application set. 

A triple frame contains three iterations of the application set and therefore requires three iterations of 
the Interactive Consistency Tasks. The Global Executive Tasks are dispatched once every major frame. 
In particular, the Reconfiguration Task is executed at the major frame rate. Preliminary design studies 
recommended a major frame period of 100 ms in order to achieve the reliability requirements of SIFT. 

Under Version B, the application set of four tasks utilizes seven 3.2-ms subframes. The executive 
overhead then is 


Version B subsystem        Dispatch    Execution   Total       No. of 3.2-ms
                           time, ms    time, ms    time, ms    subframes

Interactive Consistency      1.7         11.8        13.5           5
Error Task                   0.3          0.3         0.6           1
Fault Isolation Task         0.3          2.4         2.7           1
Clock Synchronization        0.3          2.4         2.7           1
Reconfiguration Task         1.1         34.1        35.2          11

19 subframes = 60.8 ms = 73.2% of an 83.2-ms single frame
29 subframes = 92.8 ms = 58.0% of a 160.0-ms triple frame


Under Version R, the application set of four tasks utilizes twelve 1.6-ms subframes. The executive 
overhead then is 


Version R subsystem        Dispatch    Execution   Total       No. of 1.6-ms
                           time, ms    time, ms    time, ms    subframes

Interactive Consistency      1.4         11.8        13.2          10
Error Task                   0.9          0.3         1.2           2
Fault Isolation Task         1.4          2.4         3.8           3
Clock Synchronization        0.3          2.4         2.7           2
Reconfiguration Task         0.9          2.4         3.3           3

20 subframes = 32.0 ms = 62.5% of a 51.2-ms single frame
40 subframes = 64.0 ms = 52.6% of a 121.6-ms triple frame


Under Version V, the application set of four tasks utilizes ten 1.6-ms subframes. The executive overhead 
then is 


Version V subsystem        Dispatch    Execution   Total       No. of 1.6-ms
                           time, ms    time, ms    time, ms    subframes

Interactive Consistency      1.3         11.8        13.1          10
Error Task                   0.7          0.3         1.0           2
Fault Isolation Task         0.7          2.4         3.1           2
Clock Synchronization        0.3          2.4         2.7           2
Reconfiguration Task         0.7          2.4         3.1           2

Total                                                              18

18 subframes = 28.8 ms = 64.3% of a 44.8-ms single frame
38 subframes = 60.8 ms = 55.9% of a 108.8-ms triple frame
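
The frame sizes and overhead totals in the tables above can be recomputed from the subframe counts. The following Python sketch assumes, as the text indicates, that a triple frame repeats only the Interactive Consistency Tasks and the application set three times while the other executive tasks run once; the function name and argument layout are hypothetical.

```python
def frame_overhead(exec_subframes, ic_subframes, app_subframes,
                   subframe_ms, iterations=1):
    """Return (overhead ms, frame ms, overhead percent) for a major frame.

    exec_subframes -- subframes of all executive tasks in a single frame
    ic_subframes   -- the part of exec_subframes used by Interactive Consistency
    app_subframes  -- subframes used by one iteration of the application set
    iterations     -- application iterations per major frame (3 = triple frame)
    """
    overhead = exec_subframes + (iterations - 1) * ic_subframes
    total = overhead + iterations * app_subframes
    return (round(overhead * subframe_ms, 1),
            round(total * subframe_ms, 1),
            round(100.0 * overhead / total, 1))


print(frame_overhead(20, 10, 12, 1.6))      # Version R, single frame
# (32.0, 51.2, 62.5)
print(frame_overhead(18, 10, 10, 1.6, 3))   # Version V, triple frame
# (60.8, 108.8, 55.9)
```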






The overhead improvement in the subsequent ver- 
sions of the operating system is readily seen in the 
decrease in the length of a triple frame. The de- 
crease from Version B to Version R (i.e., 160.0 ms 
to 121.6 ms) is a result of the improved Reconfigu- 
ration Task. The further decrease from Version R to 
Version V (i.e., 121.6 ms to 108.8 ms) resulted from 
the decrease in vote time and the consequent ability 
to squeeze several tasks into one less subframe each. 
The higher percentage overhead of Version V results 
from the smaller major frame size and not from any 
increased inefficiency. 

The impact of the fault tolerance mechanisms 
on performance can be seen by comparison with an 
equivalent simplex system. The overhead of such a 
simplex system is easily calculated from the available 
data. Without voting, the dispatch overhead would 
be about 270 µs, or less than 10 percent of a 3.2-ms
subframe. The time needed to execute the four sample
application tasks would be approximately four
subframes, or 12.8 ms. A communications task, the
equivalent of Interactive Consistency Task 1 (IC1),
would still be needed and would require at most one 
3.2-ms subframe. The executive task overhead would 
then be 20 percent of a small 16.0-ms major frame or 
3.2 percent of a larger 100-ms major frame. 
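
The simplex figures can be restated as a short calculation. This is only a worked version of the numbers above, assuming the small major frame contains exactly the four application subframes and the one communications subframe:

```latex
\begin{align*}
T_{\text{frame}} &= (4 + 1) \times 3.2\ \text{ms} = 16.0\ \text{ms} \\
\frac{3.2\ \text{ms}}{16.0\ \text{ms}} &= 0.20 = 20\ \text{percent} \\
\frac{3.2\ \text{ms}}{100\ \text{ms}} &= 0.032 = 3.2\ \text{percent}
\end{align*}
```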


Discussion of Results 

The SIFT operating system utilizes significant 
CPU resources to achieve fault tolerance in software. 
This overhead falls in two main categories: 

1. The time required to vote the intertask communi- 
cation variables at the beginning of each subframe 

2. The time utilized by the executive tasks, espe- 
cially the Interactive Consistency Tasks 

The overhead from the first category, vote over- 
head, was found to be a linear function of the amount 
of data to be voted. Unfortunately, for as few as 
six data buffers, the vote overhead was in excess of 
30 percent of a 3.2-ms subframe. The vote times were 
measured for three versions of the operating system. 
Also, a slight modification to the VOTE routine was 
discovered which enables a more efficient three-way 
vote if the system is run with only three-way voting, 
i.e., no five-way replicated tasks. This modification is
referred to as TRIAD SIFT. The following table gives 
a basic comparison of the vote times (best values) for 
all versions of SIFT investigated with a six-processor 
configuration: 


Version      Three-way vote      Five-way vote
             time per            time per
             buffer, ms          buffer, ms

B              0.413               0.412
TRIAD-B        0.352
R              0.302               0.357
TRIAD-R        0.247
V              0.079 (a)           0.107 (b)
TRIAD-V        0.079 (a)

(a) Does not include 0.245-ms initial overhead.
(b) Does not include 0.328-ms initial overhead.


The vote times were found to vary with the number of 
processors in the configuration and the location of the 
task replicates in the schedule table. These variations 
were typically on the order of 10 to 25 percent. 

The overhead due to the second category, the ex- 
ecutive task overhead, is given below. The Interac- 
tive Consistency Tasks were the major contributors 
in this category and accounted for 56 percent of the 
executive task overhead in the optimized Version V. 
The overhead when a single frame is scheduled is 


Version     Overhead,    Frame size,    Overhead,
            ms           ms             percent of
                                        frame size

B             60.8          83.2          73.2
R             32.0          51.2          62.5
V             28.8          44.8          64.3


The reduced major frame size for Versions R and 
V arises because the improved vote performance en- 
ables some tasks to be scheduled in fewer subframes. 
The overhead for a triple frame is 


Version     Overhead,    Frame size,    Overhead,
            ms           ms             percent of
                                        frame size

B             92.8         160.0          58.0
R             64.0         121.6          52.6
V             60.8         108.8          55.9


Concluding Remarks 

The Software Implemented Fault Tolerance 
(SIFT) computer system requires significant over- 
head to achieve its fault tolerance. Several versions 
of SIFT — Versions B, R, and V — evolved as improve- 
ments were made to reduce this overhead. Version B 
is the original delivered version of SIFT. This version
only runs by disabling the clock interrupt function
while many of the executive tasks are executing.
This disabling of interrupts is unacceptable, since it 
seriously disrupts the cyclic output of the application 
tasks. To eliminate this problem, the system was re- 
designed to produce Version R. A drastic reduction 
in the Reconfiguration Task overhead was obtained, 
and feasible schedules were constructed without dis- 
abling of the interrupts. Finally, a redesign of the 
vote subsystem resulted in Version V. 

The voting and interactive consistency functions 
were found to be the primary sources of operat- 
ing system overhead. Unfortunately, these functions 
seem to be inherently expensive when implemented 
in software. Several modifications were made to the 
vote subsystem, but only moderate improvements 
were obtained. Even in the improved system, with 
as few as six input variables, the five-way vote time
can consume over 30 percent of a 3.2-ms subframe. 
The Interactive Consistency Tasks require 13.1 ms 
for every iteration of the applications task set (Ver- 
sion V). The Interactive Consistency Tasks along 
with the other Global Executive Tasks consume at 
least 55.9 percent of each major frame. By contrast, 
a simplex system with no voting or redundancy man- 
agement would use less than 10 percent of each sub- 
frame during scheduling. If a single communications 
task similar to Interactive Consistency Task 1 (IC1) 
is utilized, the executive task overhead is reduced 
to about 20 percent of a single 16.0-ms major frame 
or 3.2 percent of a 100-ms major frame. The fault- 
tolerant requirements of the SIFT system produce an 
overhead at least 3 times that of conventional sys- 
tems. There appears to be little hope for improve- 
ment of these figures without additional hardware 
support. 

The vote time dependency on the schedule table 
and on the number of working processors in the 
system is a serious obstacle to validation. One 
must be careful that sufficient processing time is 
allocated to a task to cover all possible configurations 
of SIFT in which the task may run. The validation 


problem is further compounded because erroneous 
data also increase the vote time. The marginal 
increase in performance gained by adding software 
shortcuts is offset by the increased effort required 
for validation. By designing the vote system so 
the vote time is constant and independent of the 
schedule table or number of working processors, the 
complexity is reduced and the multiplicity of test 
modes is eliminated. Of course, the vote time is then 
always worst case. 


NASA Langley Research Center 
Hampton, VA 23665 
December 10, 1984 